Richard Wen
rwen@ryerson.ca
First, we need to import the required libraries.
import calendar
import cufflinks as cf
import plotly
import plotly.graph_objects as go
import pandas as pd
from pandas.api.types import is_string_dtype
from sklearn.ensemble import RandomForestClassifier
Then, we have to configure cufflinks and plotly for offline notebook plotting. We also set the cufflinks plot theme and the number of columns to display when we present tables.
cf.go_offline()
cf.set_config_file(theme='white')
plotly.offline.init_notebook_mode()
pd.set_option('display.max_columns', None)
Finally, we apply some re-usable global settings for our notebook:
- ksi_data_url: link to the .csv Killed or Seriously Injured (KSI) data from the Toronto Public Safety Data Portal
- map_style: map style for base maps (see plotly mapbox styles)
- map_margins: margins for the map, which we decrease to give more space for the map (except for the top, which contains the title)
ksi_data_url = 'https://opendata.arcgis.com/datasets/88e8040e02d5493eb163e454140d3a34_0.csv?outSR=%7B%22latestWkid%22%3A3857%2C%22wkid%22%3A102100%7D'
map_style = 'carto-positron'
map_margins = {'l': 0, 'r': 0, 'b': 0, 't': 35}
This experiment uses the Killed or Seriously Injured (KSI) data available from the Toronto Police Public Safety Data Portal.
Here is a short summary from the glossary documentation of the data:
The Killed and Seriously Injured (KSI) data is a subset dataset from all traffic collision events. The source of the data comes from police reports where an officer attended an event related to a traffic collision.
Please note that this dataset does not include all traffic collision events. The KSI data only includes events where a person sustained a major or fatal injury in a traffic collision event.
The following definitions relate to the severity of injury used to classify the events in this dataset.
- Major Injury: A non-fatal injury that is severe enough to require the injured person to be admitted to hospital, even if only for observation at the time of the collision. Includes: fracture, internal injury, severe cuts, crushing, burns, concussion, severe general shocks.
- Fatal: Fatal injury (person sustains bodily injuries resulting in death) only those cases where death occurs in less than 366 days as result of the collision. “Fatal” does not include death from natural causes (heart attack, stroke, epilated seizure, etc.) or suicide.
- Note: Other injury types including minor or none are associated to every individual included in the event.
The KSI data includes a record (row) for every person involved in the collision event regardless of their level of injury, it includes everyone who was involved in a particular collision event. The field “Index” provides an arbitrary unique identification for every record in the entire dataset.
The “ACCNUM” is a unique identification for each traffic collision event. Since the data includes every person involved in a collision event, this identification is duplicated. Please note that this number is not unique and it may repeat year over year. Careful consideration must be made when creating a subset for unique events, as the detailed information provided is for every person involved and its associated role and information may be lost.
For example, the event with ACCNUM=6000607400 has 5 persons involved in the collision (5 records). The field “INVTYPE” indicates the role of the person in the collision event. The “INVAGE” indicates the age range of the person and the “INJURY” type indicates the level of injury they sustained. Therefore, this event can be interpreted in the following way:
1. Passenger 1 age 20 to 24 sustained a fatal injury.
2. Passenger 2 age 15 to 19 sustained a fatal injury.
3. Passenger 3 age 20 to 24 sustained a major injury.
4. Driver 1 age 20 to 24 sustained a major injury.
5. Driver 2 age 45 to 49 sustained a major injury.
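The glossary's warning about subsetting to unique events can be illustrated with a small sketch. The rows below are made up for illustration (they are not real KSI records), and treating `'Fatal'` as the exact INJURY label is an assumption. The idea is to summarize per-person information to the event level instead of simply dropping duplicate ACCNUM rows.

```python
import pandas as pd

# Toy records (not real KSI rows): one row per person involved in a collision
people = pd.DataFrame({
    'ACCNUM': [1001, 1001, 1001, 1002, 1001],
    'YEAR': [2015, 2015, 2015, 2015, 2016],
    'INJURY': ['Fatal', 'Major', 'None', 'Major', 'Minimal'],
})

# YEAR is part of the key because ACCNUM may repeat year over year
summary = people.groupby(['YEAR', 'ACCNUM']).agg(
    persons=('INJURY', 'size'),
    # Assumption: 'Fatal' is the label used for fatal injuries
    any_fatal=('INJURY', lambda s: (s == 'Fatal').any()),
)
```

Each row of `summary` is then one collision event, with the number of persons involved and whether anyone was fatally injured, so the per-person detail is aggregated rather than lost.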
First, we try to download and read the data from ksi_data_url. If that does not work, we fall back to a saved copy of the data at data/ksi.csv.
try:
    ksi = pd.read_csv(ksi_data_url)
except Exception:
    # Fall back to the saved copy if the download or parsing fails
    ksi = pd.read_csv('data/ksi.csv')
ksi
| X | Y | Index_ | ACCNUM | YEAR | DATE | TIME | Hour | STREET1 | STREET2 | OFFSET | ROAD_CLASS | District | WardNum | WardNum_X | WardNum_Y | Division | Division_X | Division_Y | LATITUDE | LONGITUDE | LOCCOORD | ACCLOC | TRAFFCTL | VISIBILITY | LIGHT | RDSFCOND | ACCLASS | IMPACTYPE | INVTYPE | INVAGE | INJURY | FATAL_NO | INITDIR | VEHTYPE | MANOEUVER | DRIVACT | DRIVCOND | PEDTYPE | PEDACT | PEDCOND | CYCLISTYPE | CYCACT | CYCCOND | PEDESTRIAN | CYCLIST | AUTOMOBILE | MOTORCYCLE | TRUCK | TRSN_CITY_ | EMERG_VEH | PASSENGER | SPEEDING | AG_DRIV | REDLIGHT | ALCOHOL | DISABILITY | Hood_ID | Neighbourh | ObjectId | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -79.412438 | 43.767462 | 80221198 | 4003162994 | 2014 | 2014-10-24T04:00:00.000Z | 2315 | 23 | YONGE ST | HILLCREST AVE | Major Arterial | North York | 18 | 0 | 0 | 32 | 0 | 0 | 43.767462 | -79.412438 | Intersection | Non Intersection | No Control | Clear | Dark, artificial | Dry | Non-Fatal Injury | Sideswipe | Passenger | unknown | None | 0 | Yes | Yes | Yes | 51 | Willowdale East (51) | 12001 | ||||||||||||||||||||||
| 1 | -79.516246 | 43.718318 | 80565670 | 6001093797 | 2016 | 2016-06-22T04:00:00.000Z | 2315 | 23 | 120 BEVERLY HILLS DR | 65 m South of | Collector | Etobicoke York | 7 | 0 | 0 | 31 | 0 | 0 | 43.718318 | -79.516246 | Mid-Block | Non Intersection | No Control | Clear | Dark, artificial | Dry | Non-Fatal Injury | Pedestrian Collisions | Driver | 25 to 29 | None | 0 | South | Automobile, Station Wagon | Going Ahead | Driving Properly | Normal | Yes | Yes | Yes | 26 | Downsview-Roding-CFB (26) | 12002 | |||||||||||||||||
| 2 | -79.516246 | 43.718318 | 80565671 | 6001093797 | 2016 | 2016-06-22T04:00:00.000Z | 2315 | 23 | 120 BEVERLY HILLS DR | 65 m South of | Collector | Etobicoke York | 7 | 0 | 0 | 31 | 0 | 0 | 43.718318 | -79.516246 | Mid-Block | Non Intersection | No Control | Clear | Dark, artificial | Dry | Non-Fatal Injury | Pedestrian Collisions | Passenger | 30 to 34 | None | 0 | Yes | Yes | Yes | 26 | Downsview-Roding-CFB (26) | 12003 | ||||||||||||||||||||||
| 3 | -79.516246 | 43.718318 | 80565672 | 6001093797 | 2016 | 2016-06-22T04:00:00.000Z | 2315 | 23 | 120 BEVERLY HILLS DR | 65 m South of | Collector | Etobicoke York | 7 | 0 | 0 | 31 | 0 | 0 | 43.718318 | -79.516246 | Mid-Block | Non Intersection | No Control | Clear | Dark, artificial | Dry | Non-Fatal Injury | Pedestrian Collisions | Pedestrian | 10 to 14 | Major | 0 | East | Vehicle hits the pedestrian walking or running... | Crossing, no Traffic Control | Inattentive | Yes | Yes | Yes | 26 | Downsview-Roding-CFB (26) | 12004 | ||||||||||||||||||
| 4 | -79.374309 | 43.662909 | 80632379 | 6002153175 | 2016 | 2016-12-04T05:00:00.000Z | 2315 | 23 | CARLTON STREET | HOMEWOOD AVENUE | Minor Arterial | Toronto and East York | 13 | 0 | 0 | 51 | 0 | 0 | 43.662909 | -79.374309 | Intersection | At Intersection | No Control | Rain | Dark, artificial | Wet | Non-Fatal Injury | Pedestrian Collisions | Driver | 75 to 79 | None | 0 | East | Automobile, Station Wagon | Turning Left | Failed to Yield Right of Way | Inattentive | Yes | Yes | Yes | 73 | Moss Park (73) | 12005 | |||||||||||||||||
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12239 | -79.330990 | 43.804445 | 5273577 | 1119725 | 2009 | 2009-07-26T04:00:00.000Z | 236 | 2 | 3330 PHARMACY Aven | Minor Arterial | Scarborough | 22 | 0 | 0 | 42 | 0 | 0 | 43.804445 | -79.330990 | No Control | Clear | Dark | Dry | Non-Fatal Injury | Pedestrian Collisions | Driver | unknown | None | 0 | South | Automobile, Station Wagon | Going Ahead | Yes | Yes | 116 | Steeles (116) | 996 | |||||||||||||||||||||||
| 12240 | -79.330990 | 43.804445 | 5273578 | 1119725 | 2009 | 2009-07-26T04:00:00.000Z | 236 | 2 | 3330 PHARMACY Aven | Minor Arterial | Scarborough | 22 | 0 | 0 | 42 | 0 | 0 | 43.804445 | -79.330990 | No Control | Clear | Dark | Dry | Non-Fatal Injury | Pedestrian Collisions | Pedestrian | unknown | Major | 0 | Other | Other | Yes | Yes | 116 | Steeles (116) | 997 | ||||||||||||||||||||||||
| 12241 | -79.330990 | 43.804445 | 5273579 | 1119725 | 2009 | 2009-07-26T04:00:00.000Z | 236 | 2 | 3330 PHARMACY Aven | Minor Arterial | Scarborough | 22 | 0 | 0 | 42 | 0 | 0 | 43.804445 | -79.330990 | No Control | Clear | Dark | Dry | Non-Fatal Injury | Pedestrian Collisions | Pedestrian | unknown | Minimal | 0 | Other | Other | Yes | Yes | 116 | Steeles (116) | 998 | ||||||||||||||||||||||||
| 12242 | -79.228359 | 43.791693 | 80205836 | 4001787575 | 2014 | 2014-03-29T04:00:00.000Z | 236 | 2 | 455 MILNER AVE | Minor Arterial | Scarborough | 23 | 0 | 0 | 42 | 0 | 0 | 43.791693 | -79.228359 | Mid-Block | Non Intersection | No Control | Clear | Dark, artificial | Dry | Non-Fatal Injury | SMV Other | Driver | 25 to 29 | Major | 0 | East | Delivery Van | Going Ahead | Lost control | Ability Impaired, Alcohol | Yes | Yes | Yes | 132 | Malvern (132) | 999 | ||||||||||||||||||
| 12243 | -79.524242 | 43.755858 | 80927971 | 7003085452 | 2017 | 2017-11-28T05:00:00.000Z | 558 | 5 | FINCH AVE W | NORFINCH DR | Major Arterial | Etobicoke York | 7 | 0 | 0 | 31 | 0 | 0 | 43.755858 | -79.524242 | Intersection | At Intersection | Traffic Signal | Clear | Dark, artificial | Dry | Non-Fatal Injury | Turning Movement | Driver | 20 to 24 | None | 0 | West | Automobile, Station Wagon | Going Ahead | Driving Properly | Normal | Yes | Yes | Yes | 25 | Glenfield-Jane Heights (25) | 1000 |
12244 rows × 60 columns
Notice here that there are a few columns with 'Yes' values in them, and that the DATE column is stored as text. We will need to preprocess these for our analyses.
The date is in ISO 8601 UTC format, but needs to be converted into a date time object using pd.to_datetime.
ksi.DATE = pd.to_datetime(ksi.DATE)
ksi.DATE
0 2014-10-24 04:00:00+00:00
1 2016-06-22 04:00:00+00:00
2 2016-06-22 04:00:00+00:00
3 2016-06-22 04:00:00+00:00
4 2016-12-04 05:00:00+00:00
...
12239 2009-07-26 04:00:00+00:00
12240 2009-07-26 04:00:00+00:00
12241 2009-07-26 04:00:00+00:00
12242 2014-03-29 04:00:00+00:00
12243 2017-11-28 05:00:00+00:00
Name: DATE, Length: 12244, dtype: datetime64[ns, UTC]
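The 04:00 and 05:00 UTC times in the output above look like local midnight in Toronto (UTC-4 in summer, UTC-5 in winter). The sketch below checks this on two DATE values copied from the output. That the offset only encodes local midnight is an assumption about how the portal exports dates, not something the glossary states.

```python
import pandas as pd

# Two DATE values copied from the output above
dates_utc = pd.to_datetime(pd.Series([
    '2014-10-24T04:00:00.000Z',
    '2016-12-04T05:00:00.000Z',
]), utc=True)

# Convert to Toronto local time (assumption: the UTC offset encodes local midnight)
dates_local = dates_utc.dt.tz_convert('America/Toronto')
local_hours = dates_local.dt.hour.tolist()
local_days = dates_local.dt.strftime('%Y-%m-%d').tolist()
```

If the assumption holds, the calendar date is the same in UTC and in local time, so extracting the month or weekday from the UTC column gives the same result as using local time.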
Next, we convert the columns containing 'Yes' values to 1 (Yes) and 0 (No) so that we can obtain counts for each variable and perform numerical computation. Blank values in these columns are treated as No.
# Get the string columns that contain 'Yes' values
ksi_str_columns = [c for c in ksi.columns if is_string_dtype(ksi[c])]
ksi_yes_columns = [c for c in ksi_str_columns if 'Yes' in ksi[c].values]
# If there are any 'Yes' columns, convert them to 1 and 0
if len(ksi_yes_columns) > 0:
    ksi[ksi_yes_columns] = ksi[ksi_yes_columns].apply(lambda c: [1 if r == 'Yes' else 0 for r in c])
ksi
| X | Y | Index_ | ACCNUM | YEAR | DATE | TIME | Hour | STREET1 | STREET2 | OFFSET | ROAD_CLASS | District | WardNum | WardNum_X | WardNum_Y | Division | Division_X | Division_Y | LATITUDE | LONGITUDE | LOCCOORD | ACCLOC | TRAFFCTL | VISIBILITY | LIGHT | RDSFCOND | ACCLASS | IMPACTYPE | INVTYPE | INVAGE | INJURY | FATAL_NO | INITDIR | VEHTYPE | MANOEUVER | DRIVACT | DRIVCOND | PEDTYPE | PEDACT | PEDCOND | CYCLISTYPE | CYCACT | CYCCOND | PEDESTRIAN | CYCLIST | AUTOMOBILE | MOTORCYCLE | TRUCK | TRSN_CITY_ | EMERG_VEH | PASSENGER | SPEEDING | AG_DRIV | REDLIGHT | ALCOHOL | DISABILITY | Hood_ID | Neighbourh | ObjectId | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -79.412438 | 43.767462 | 80221198 | 4003162994 | 2014 | 2014-10-24 04:00:00+00:00 | 2315 | 23 | YONGE ST | HILLCREST AVE | Major Arterial | North York | 18 | 0 | 0 | 32 | 0 | 0 | 43.767462 | -79.412438 | Intersection | Non Intersection | No Control | Clear | Dark, artificial | Dry | Non-Fatal Injury | Sideswipe | Passenger | unknown | None | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 51 | Willowdale East (51) | 12001 | ||||||||||||
| 1 | -79.516246 | 43.718318 | 80565670 | 6001093797 | 2016 | 2016-06-22 04:00:00+00:00 | 2315 | 23 | 120 BEVERLY HILLS DR | 65 m South of | Collector | Etobicoke York | 7 | 0 | 0 | 31 | 0 | 0 | 43.718318 | -79.516246 | Mid-Block | Non Intersection | No Control | Clear | Dark, artificial | Dry | Non-Fatal Injury | Pedestrian Collisions | Driver | 25 to 29 | None | 0 | South | Automobile, Station Wagon | Going Ahead | Driving Properly | Normal | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 26 | Downsview-Roding-CFB (26) | 12002 | |||||||
| 2 | -79.516246 | 43.718318 | 80565671 | 6001093797 | 2016 | 2016-06-22 04:00:00+00:00 | 2315 | 23 | 120 BEVERLY HILLS DR | 65 m South of | Collector | Etobicoke York | 7 | 0 | 0 | 31 | 0 | 0 | 43.718318 | -79.516246 | Mid-Block | Non Intersection | No Control | Clear | Dark, artificial | Dry | Non-Fatal Injury | Pedestrian Collisions | Passenger | 30 to 34 | None | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 26 | Downsview-Roding-CFB (26) | 12003 | ||||||||||||
| 3 | -79.516246 | 43.718318 | 80565672 | 6001093797 | 2016 | 2016-06-22 04:00:00+00:00 | 2315 | 23 | 120 BEVERLY HILLS DR | 65 m South of | Collector | Etobicoke York | 7 | 0 | 0 | 31 | 0 | 0 | 43.718318 | -79.516246 | Mid-Block | Non Intersection | No Control | Clear | Dark, artificial | Dry | Non-Fatal Injury | Pedestrian Collisions | Pedestrian | 10 to 14 | Major | 0 | East | Vehicle hits the pedestrian walking or running... | Crossing, no Traffic Control | Inattentive | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 26 | Downsview-Roding-CFB (26) | 12004 | ||||||||
| 4 | -79.374309 | 43.662909 | 80632379 | 6002153175 | 2016 | 2016-12-04 05:00:00+00:00 | 2315 | 23 | CARLTON STREET | HOMEWOOD AVENUE | Minor Arterial | Toronto and East York | 13 | 0 | 0 | 51 | 0 | 0 | 43.662909 | -79.374309 | Intersection | At Intersection | No Control | Rain | Dark, artificial | Wet | Non-Fatal Injury | Pedestrian Collisions | Driver | 75 to 79 | None | 0 | East | Automobile, Station Wagon | Turning Left | Failed to Yield Right of Way | Inattentive | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 73 | Moss Park (73) | 12005 | |||||||
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12239 | -79.330990 | 43.804445 | 5273577 | 1119725 | 2009 | 2009-07-26 04:00:00+00:00 | 236 | 2 | 3330 PHARMACY Aven | Minor Arterial | Scarborough | 22 | 0 | 0 | 42 | 0 | 0 | 43.804445 | -79.330990 | No Control | Clear | Dark | Dry | Non-Fatal Injury | Pedestrian Collisions | Driver | unknown | None | 0 | South | Automobile, Station Wagon | Going Ahead | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 116 | Steeles (116) | 996 | ||||||||||||
| 12240 | -79.330990 | 43.804445 | 5273578 | 1119725 | 2009 | 2009-07-26 04:00:00+00:00 | 236 | 2 | 3330 PHARMACY Aven | Minor Arterial | Scarborough | 22 | 0 | 0 | 42 | 0 | 0 | 43.804445 | -79.330990 | No Control | Clear | Dark | Dry | Non-Fatal Injury | Pedestrian Collisions | Pedestrian | unknown | Major | 0 | Other | Other | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 116 | Steeles (116) | 997 | |||||||||||||
| 12241 | -79.330990 | 43.804445 | 5273579 | 1119725 | 2009 | 2009-07-26 04:00:00+00:00 | 236 | 2 | 3330 PHARMACY Aven | Minor Arterial | Scarborough | 22 | 0 | 0 | 42 | 0 | 0 | 43.804445 | -79.330990 | No Control | Clear | Dark | Dry | Non-Fatal Injury | Pedestrian Collisions | Pedestrian | unknown | Minimal | 0 | Other | Other | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 116 | Steeles (116) | 998 | |||||||||||||
| 12242 | -79.228359 | 43.791693 | 80205836 | 4001787575 | 2014 | 2014-03-29 04:00:00+00:00 | 236 | 2 | 455 MILNER AVE | Minor Arterial | Scarborough | 23 | 0 | 0 | 42 | 0 | 0 | 43.791693 | -79.228359 | Mid-Block | Non Intersection | No Control | Clear | Dark, artificial | Dry | Non-Fatal Injury | SMV Other | Driver | 25 to 29 | Major | 0 | East | Delivery Van | Going Ahead | Lost control | Ability Impaired, Alcohol | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 132 | Malvern (132) | 999 | ||||||||
| 12243 | -79.524242 | 43.755858 | 80927971 | 7003085452 | 2017 | 2017-11-28 05:00:00+00:00 | 558 | 5 | FINCH AVE W | NORFINCH DR | Major Arterial | Etobicoke York | 7 | 0 | 0 | 31 | 0 | 0 | 43.755858 | -79.524242 | Intersection | At Intersection | Traffic Signal | Clear | Dark, artificial | Dry | Non-Fatal Injury | Turning Movement | Driver | 20 to 24 | None | 0 | West | Automobile, Station Wagon | Going Ahead | Driving Properly | Normal | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 25 | Glenfield-Jane Heights (25) | 1000 |
12244 rows × 60 columns
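As an aside, the same Yes-to-1 conversion can be written without a Python-level loop. This is a minimal sketch on a made-up frame (not real KSI rows), where `None` stands in for the blank cells that pandas reads as missing values.

```python
import pandas as pd

# Toy frame (not real KSI rows); None stands in for blank cells
flags = pd.DataFrame({
    'SPEEDING': ['Yes', None, 'Yes'],
    'ALCOHOL': [None, None, 'Yes'],
})

# eq('Yes') is True only for exact matches, so missing values become 0
flags_int = flags.eq('Yes').astype(int)
```

Both approaches map anything other than an exact `'Yes'` to 0, so a 0 here means "not recorded as Yes" rather than a confirmed No.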
# Initial summary stats
ksi_summary = ksi.describe()
# Get the sum for all summary columns
ksi_sum = ksi[ksi_summary.columns].sum()
ksi_sum = ksi_sum.rename('sum')
# Add the sum as a new row (DataFrame.append was removed in pandas 2.0)
ksi_summary = pd.concat([ksi_summary, ksi_sum.to_frame().T])
ksi_summary
| X | Y | Index_ | ACCNUM | YEAR | TIME | Hour | WardNum | WardNum_X | WardNum_Y | Division_X | Division_Y | LATITUDE | LONGITUDE | FATAL_NO | PEDESTRIAN | CYCLIST | AUTOMOBILE | MOTORCYCLE | TRUCK | TRSN_CITY_ | EMERG_VEH | PASSENGER | SPEEDING | AG_DRIV | REDLIGHT | ALCOHOL | DISABILITY | Hood_ID | ObjectId | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 12244.000000 | 12244.000000 | 1.224400e+04 | 1.224400e+04 | 1.224400e+04 | 1.224400e+04 | 12244.000000 | 1.224400e+04 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 12244.000000 | 1.224400e+04 |
| mean | -79.396212 | 43.710748 | 3.587528e+07 | 2.370242e+09 | 2.012689e+03 | 1.352408e+03 | 13.243711 | 3.506657e+03 | 2.047615 | 1.729173 | 4.121937 | 3.339268 | 43.710748 | -79.396212 | 1.471905 | 0.408200 | 0.109523 | 0.902728 | 0.085920 | 0.058968 | 0.066808 | 0.002123 | 0.367037 | 0.166204 | 0.516498 | 0.078324 | 0.039775 | 0.027769 | 73.352499 | 6.122500e+03 |
| std | 0.103606 | 0.056192 | 3.625811e+07 | 3.074230e+09 | 3.136108e+00 | 6.249500e+02 | 6.257227 | 2.194788e+05 | 5.642203 | 4.947143 | 13.208229 | 11.146807 | 0.056192 | 0.103606 | 7.595429 | 0.491521 | 0.312307 | 0.296340 | 0.280257 | 0.235574 | 0.249700 | 0.046034 | 0.482016 | 0.372279 | 0.499748 | 0.268692 | 0.195437 | 0.164316 | 41.372891 | 3.534683e+03 |
| min | -79.638390 | 43.592047 | 0.000000e+00 | 1.284070e+05 | 2.008000e+03 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 43.592047 | -79.638390 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000e+00 |
| 25% | -79.468615 | 43.662445 | 6.176591e+06 | 1.180965e+06 | 2.010000e+03 | 9.200000e+02 | 9.000000 | 7.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 43.662445 | -79.468615 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 38.000000 | 3.061750e+03 |
| 50% | -79.397290 | 43.702246 | 7.559770e+06 | 1.335254e+06 | 2.012000e+03 | 1.440000e+03 | 14.000000 | 1.300000e+01 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 43.702246 | -79.397290 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 77.000000 | 6.122500e+03 |
| 75% | -79.319248 | 43.756827 | 8.054227e+07 | 5.002033e+09 | 2.015000e+03 | 1.838000e+03 | 18.000000 | 2.200000e+01 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 43.756827 | -79.319248 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 111.000000 | 9.183250e+03 |
| max | -79.125896 | 43.855445 | 8.109988e+07 | 8.008069e+09 | 2.018000e+03 | 2.359000e+03 | 23.000000 | 1.716222e+07 | 25.000000 | 24.000000 | 55.000000 | 55.000000 | 43.855445 | -79.125896 | 78.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 140.000000 | 1.224400e+04 |
| sum | -972127.222223 | 535194.398736 | 4.392569e+11 | 2.902125e+13 | 2.464336e+07 | 1.655888e+07 | 162156.000000 | 4.293551e+07 | 25071.000000 | 21172.000000 | 50469.000000 | 40886.000000 | 535194.398736 | -972127.222223 | 18022.000000 | 4998.000000 | 1341.000000 | 11053.000000 | 1052.000000 | 722.000000 | 818.000000 | 26.000000 | 4494.000000 | 2035.000000 | 6324.000000 | 959.000000 | 487.000000 | 340.000000 | 898128.000000 | 7.496389e+07 |
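In the sample rows shown earlier, TIME looks like a 24-hour clock reading packed into an integer (2315 alongside Hour 23, 236 alongside Hour 2). The sketch below splits such a value into hour and minute. The helper name `split_time` is hypothetical, and the HHMM packing is an assumption inferred from those rows.

```python
def split_time(time_value):
    """Split an integer like 2315 into (hour, minute), assuming HHMM packing."""
    return divmod(time_value, 100)

# TIME values taken from the sample rows, whose Hour values are 23, 2 and 5
samples = [2315, 236, 558]
hours_minutes = [split_time(t) for t in samples]
```

The hour part matches the Hour column for these rows, which suggests Hour is derived from TIME.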
Here are a few things to note:
- YEAR ranges from 2008 to 2018
- TIME values range from 0 to 2359 (24-hour clock), similar to Hour (0 to 23)
- X and Y, and LONGITUDE and LATITUDE, are the coordinates of the collisions inside Toronto
- PEDESTRIAN, CYCLIST, AUTOMOBILE, and MOTORCYCLE are interesting as they define the type of collision based on the method of transportation
With the information above, we can start by plotting the number of collisions per year.
group_by_year = pd.Grouper(key = 'DATE', freq = 'Y')
ksi_yearly = ksi.groupby(group_by_year).DATE.count()
ksi_yearly.iplot(title = 'KSI Collisions Per Year')
The plot above shows us that there was a noticeably sharp decline in the number of KSI collisions from 2013 to 2015, while the collisions seem to slowly rise again afterwards. There is also a sharp increase from 2012 to 2013.
I wonder if there are government road safety policies, vehicle factors, or driver behaviours in Toronto that are causing these sharp increases and decreases.
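One caveat when reading the yearly counts: since the data has one row per person involved, counting rows measures person-records rather than distinct collision events. A minimal sketch on made-up rows (not real KSI records) shows the difference between the two counts.

```python
import pandas as pd

# Toy records (not real KSI rows): one row per person involved
records = pd.DataFrame({
    'YEAR': [2012, 2012, 2012, 2013, 2013],
    'ACCNUM': [1, 1, 2, 3, 3],
})

# Person-records per year (what a row count measures)
records_per_year = records.groupby('YEAR').size()

# Distinct collision events per year
events_per_year = records.groupby('YEAR')['ACCNUM'].nunique()
```

Which count is appropriate depends on the question: row counts reflect how many people were involved, while `nunique` reflects how many collisions took place.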
Next let's look at the distribution of collisions by transportation type or vehicular involvement.
# Get the collision types defined by the method of transportation or vehicular involvement
ksi_types_columns = ['PEDESTRIAN', 'CYCLIST', 'AUTOMOBILE', 'MOTORCYCLE', 'TRUCK', 'TRSN_CITY_', 'EMERG_VEH']
ksi_types_columns = [c for c in ksi_types_columns if c in ksi.columns]
# Get the sums for the columns
ksi_types = pd.DataFrame({
    'collision_type': ksi_types_columns,
    'collisions': ksi_summary.loc['sum', ksi_types_columns]
})
ksi_types = ksi_types.sort_values(by = 'collisions', ascending = False)
# Plot the number of collisions per type
ksi_types.iplot(kind = 'bar', x = 'collision_type', y = 'collisions', title = 'KSI Collisions By Type')
We can see that most collisions involve automobiles, followed by pedestrians, while other types of collisions occur much less frequently.
Let's also check them by year to see the changes over time.
ksi_types_year = ksi.groupby('YEAR')[ksi_types_columns].sum()
ksi_types_year.iplot(title = 'KSI Collisions Per Year By Type')
Looks like automobile and pedestrian collisions still account for most of the collisions every year, while the other types remain relatively stable.
Note: We also have to keep in mind that automobiles will likely be involved in most if not all severe collisions involving injury (due to it being the most available and used mode of transportation), which means that pedestrians are involved in a relatively large portion of severe collisions (being more vulnerable on the road).
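To put rough numbers on this note, the shares can be computed directly from the sums and row count in the summary table shown earlier. This is only a sketch using values copied from that table. It also shows that the type flags are not mutually exclusive, since one record can be flagged for several types.

```python
# Column sums and row count copied from the summary table
total_records = 12244
type_sums = {
    'PEDESTRIAN': 4998, 'CYCLIST': 1341, 'AUTOMOBILE': 11053, 'MOTORCYCLE': 1052,
    'TRUCK': 722, 'TRSN_CITY_': 818, 'EMERG_VEH': 26,
}

# Share of person-records flagged with each type
type_shares = {name: count / total_records for name, count in type_sums.items()}

# The flags overlap: their combined sum exceeds the number of records
flags_overlap = sum(type_sums.values()) > total_records
```

About 90% of records are flagged AUTOMOBILE and about 41% PEDESTRIAN, which agrees with the mean row of the summary table.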
There might be a monthly pattern that can be seen for each collision type. We can quickly check this by plotting the number of collisions per month for each type.
# Aggregate the KSI data by month
# Copy the subset so that adding a column does not trigger SettingWithCopyWarning
ksi_month = ksi[ksi_types_columns].copy()
ksi_month['MONTH'] = ksi.DATE.dt.month
ksi_month = ksi_month.groupby('MONTH').sum()
# Sort values by month
ksi_month = ksi_month.sort_values(by = 'MONTH')
ksi_month_index = ksi_month.index.to_series().apply(lambda x: calendar.month_abbr[x])
ksi_month = ksi_month.set_index(ksi_month_index)
# Plot the collision type by month
ksi_month.iplot(
    title = 'KSI Collisions By Type and Month',
    subplots = True,
    subplot_titles = True
)
A few notes for the plots above:
- PEDESTRIAN collisions occur throughout all months, with slightly higher frequencies from September to October and from May to June
- CYCLIST collisions generally occur after April and start to decline after October (maybe due to the winter season)
- AUTOMOBILE collisions are generally stable, but there is a slight rise from April onwards
- MOTORCYCLE collisions mostly occur between April and October, increasing after March and declining after October (likely due to season and weather conditions on the road)
- TRUCK collisions fluctuate, with March, August, September, and October being relatively higher than the rest
- TRSN_CITY_ or city transportation vehicle collisions fluctuate similarly to TRUCK collisions, with higher frequencies in January, June, July, and August
- EMERG_VEH or emergency vehicle collisions are relatively rare and do not occur in January to March, May to June, or November (makes sense since all road users and traffic lights have to make way for emergency vehicles)
- CYCLIST and AUTOMOBILE collisions seem to have similar monthly patterns, where collisions are high starting in May and then drop off near October
Finally, there might be particular days of the week where certain types of collisions occur more.
# Aggregate the KSI data by day of week
# Copy the subset so that adding a column does not trigger SettingWithCopyWarning
ksi_day = ksi[ksi_types_columns].copy()
ksi_day['DAY'] = ksi.DATE.dt.weekday
ksi_day = ksi_day.groupby('DAY').sum()
# Sort values by day
ksi_day = ksi_day.sort_values(by = 'DAY')
ksi_day_index = ksi_day.index.to_series().apply(lambda x: calendar.day_abbr[x])
ksi_day = ksi_day.set_index(ksi_day_index)
# Plot the collision type by day of week
ksi_day.iplot(
    title = 'KSI Collisions By Type and Day of Week',
    subplots = True,
    subplot_titles = True
)
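The month and day labels in these plots rely on two indexing conventions that are easy to mix up, so here is a small standard-library sketch of them. Months are numbered from 1, while weekdays are numbered from Monday = 0, which is also the convention pandas uses for `dt.weekday`. The date used is one that appears in the sample rows.

```python
import calendar
import datetime

# Month numbers 1..12 map to 'Jan'..'Dec'; index 0 of month_abbr is an empty string
month_labels = [calendar.month_abbr[m] for m in (1, 6, 12)]

# weekday() counts from Monday = 0, the same convention as calendar.day_abbr
collision_date = datetime.date(2016, 6, 22)  # a date from the sample rows
day_label = calendar.day_abbr[collision_date.weekday()]
```

Note that the abbreviations follow the active locale, so the English labels assume the default locale is in use.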
There are a few interesting things to note here:
- PEDESTRIAN, TRUCK and CYCLIST collisions have some similar trends throughout the weekdays, although they differ monthly as seen in the previous section
- CYCLIST and TRSN_CITY_ public transportation collisions have a very similar trend (are cyclists getting hit by city vehicles?)
- Weekends (Sat and Sun) have fewer collisions in general except for EMERG_VEH collisions, which only have a total of 6 on Sunday from 2008 to 2018
- Thursday (Thu) and Friday (Fri) have a relatively higher number of PEDESTRIAN and CYCLIST collisions
- Fri generally has a high number of collisions for all types except EMERG_VEH
- MOTORCYCLE collisions stand out on Sat (maybe most motorcyclists like riding or are unlucky on Saturdays?)
Next, we map the data using a density heat map with the LONGITUDE and LATITUDE coordinates to get an idea of what the spatial distribution is like.
# Get the latitude and longitude min/max ranges
ksi_lon_min, ksi_lon_max = ksi.describe().LONGITUDE[['min', 'max']]
ksi_lat_min, ksi_lat_max = ksi.describe().LATITUDE[['min', 'max']]
# Calculate the latitude and longitude mid points
ksi_lat_mid = (ksi_lat_max + ksi_lat_min) / 2
ksi_lon_mid = (ksi_lon_max + ksi_lon_min) / 2
# Create the map
ksi_map = go.Figure(go.Densitymapbox(
    lat = ksi.LATITUDE,
    lon = ksi.LONGITUDE,
    text = ksi,
    radius = 3
))
ksi_map.update_layout(margin = map_margins)
# Update the map style and view positions
ksi_map.update_layout(
    title = 'KSI Collision Density, Toronto, ON',
    mapbox_style = map_style,
    mapbox_center_lat = ksi_lat_mid,
    mapbox_center_lon = ksi_lon_mid,
    mapbox_zoom = 10
)
ksi_map.show()
We can see that there are more traffic accidents near the center and north-western portions of downtown Toronto. Collisions are then sparsely distributed in areas outside of the downtown core.
Build multi-output models for traffic crash coordinates.
forest = RandomForestClassifier(n_estimators=100, random_state=1)